Abstract

Linear Indexed Grammars (LIG) can be processed in polynomial time by
exploiting constraints which make possible the extensive use of
structure-sharing. This paper describes a formalism that is more
powerful than LIG, but which can also be processed in polynomial time
using similar techniques. The formalism, which we refer to as
Partially Linear PATR (PLPATR), manipulates feature structures rather
than stacks.

1 Introduction

Unification-based grammar formalisms can be viewed as generalizations
of Context-Free Grammars (CFG) where the nonterminal symbols are
replaced by an infinite domain of feature structures. Much of their
popularity stems from the way in which syntactic generalization may be
elegantly stated by means of constraints amongst features and their
values. Unfortunately, the expressivity of these formalisms can have
undesirable consequences for their processing. In naive
implementations of unification grammar parsers, feature structures
play the same role as nonterminals in standard context-free grammar
parsers. Potentially large feature structures are stored at
intermediate steps in the computation, so that the space requirements
of the algorithm are high. Furthermore, the need to perform
non-destructive unification means that a large proportion of the
processing time is spent copying feature structures.

One approach to this problem is to refine parsing algorithms by
developing techniques such as restrictions, structure-sharing, and
lazy unification that reduce the amount of structure that is stored
and hence the need for copying of feature structures (Shieber, 1985;
Pereira, ...; Tomabechi, 1991; Harrison and Ellison, 1992).
While these techniques can yield significant improve- 
ments in performance, the generality of unification- 
based grammar formalisms means that there are 
still cases where expensive processing is unavoidable. 
This approach does not address the fundamental is- 
sue of the tradeoff between the descriptive capacity 
of a formalism and its computational power. 

In this paper we identify a set of constraints that 
can be placed on unification-based grammar for- 
malisms in order to guarantee the existence of poly- 
nomial time parsing algorithms. Our choice of con- 
straints is motivated by showing how they general- 
ize constraints inherent in Linear Indexed Grammar 
(LIG). We begin by describing how constraints inher- 
ent in LIG admit tractable processing algorithms and 
then consider how these constraints can be general- 
ized to a formalism that manipulates trees rather 
than stacks. The constraints that we identify for
the tree-based system can be regarded equally well
as constraints on unification-based grammar
formalisms such as PATR (Shieber, 1984).



2 From Stacks to Trees 

An Indexed Grammar (IG) can be viewed as a CFG 
in which each nonterminal is associated with a stack 
of indices. Productions specify not only how
nonterminals can be rewritten but also how their
associated stacks are modified. LIG, which were first
described by Gazdar (1988), are constrained such
that stacks are passed from the mother to at most a
single daughter.

For LIG, the size of the domain of nonterminals
and associated stacks (the analogue of the
nonterminals in CFG) is not bounded by the grammar.
However, Vijay-Shanker and Weir (1993) demonstrate
that polynomial time performance can be achieved
through the use of structure-sharing made possible
by constraints in the way that LIG use stacks.
Although stacks of unbounded size can arise during
a derivation, it is not possible for a LIG to specify
that two dependent, unbounded stacks must appear
at distinct places in the derivation tree.
Structure-sharing can therefore be used effectively because
checking the applicability of rules at each step in
the derivation involves the comparison of structures
of limited size.

Our goal is to generalize the constraints inher- 
ent in LIG to a formalism that manipulates feature 
structures rather than stacks. As a guiding heuris- 
tic we will avoid formalisms that generate tree sets 
with an unbounded number of unbounded, depen- 
dent branches. It appears that the structure-sharing 
techniques used with LIG cannot be generalized in a 
straightforward way to such formalisms. 

Suppose that we generalize LIG to allow the stack 
to be passed from the mother to two daughters. 
If this is done, recursion can be used to produce
an unbounded number of unbounded, dependent 
branches. An alternative is to allow an unbounded 
stack to be shared between two (or more) daughters 
but not with the mother. Thus, rules may mention 
more than one unbounded stack, but the stack as- 
sociated with the mother is still associated with at 
most one daughter. We refer to this extension as 
Partially Linear Indexed Grammars (PLIG). 

Example 1 The PLIG with the following productions generates the
language

$\{ a^n b^m c^n d^m \mid n, m \geq 1 \}$

and the tree set shown in Figure 1. Because a single PLIG production
may mention more than one unbounded stack, variables (x, y) are
introduced to distinguish between them. The notation $A[x\sigma]$ is
used to denote the nonterminal A associated with any stack whose top
symbol is $\sigma$.

$A[x] \to a\,A[x\sigma]$,   $A[x] \to B[y]\,C[x]\,D[y]$,
$B[x\sigma] \to b\,B[x]$,   $B[\sigma] \to b$,
$C[x\sigma] \to c\,C[x]$,   $C[\sigma] \to c$,
$D[x\sigma] \to d\,D[x]$,   $D[\sigma] \to d$.

Figure 1: Tree set for $\{ a^n b^m c^n d^m \mid n, m \geq 1 \}$

Example 2 A PLIG with the following productions generates the k-copy
language over $\{a, b\}^*$, i.e., the language

$\{ w^k \mid w \in \{a, b\}^* \}$

where $k \geq 1$.

$S[\,] \to \underbrace{A[x] \ldots A[x]}_{k \text{ copies}}$,   $A[\,] \to \lambda$,
$A[x\sigma_1] \to a\,A[x]$,   $A[x\sigma_2] \to b\,A[x]$.
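To make the role of the shared stack in Example 2 concrete, the following Python sketch simulates the productions top-down: S[] chooses one stack x and hands the same stack to each of its k daughters, and each A pops the stack to emit a or b. The function names `expand_A` and `k_copy`, and the encoding of the two stack symbols as the strings "s1" and "s2", are illustrative assumptions and not part of the formalism.

```python
from itertools import product

def expand_A(stack):
    """Rewrite A[stack] using A[x s1] -> a A[x], A[x s2] -> b A[x], A[] -> empty string."""
    out = []
    stack = list(stack)
    while stack:
        top = stack.pop()                 # the top of the stack is the last element
        out.append("a" if top == "s1" else "b")
    return "".join(out)

def k_copy(k, max_len):
    """Enumerate strings derived from S[] -> A[x] ... A[x] (k copies), for stacks up to max_len."""
    for n in range(max_len + 1):
        for stack in product(("s1", "s2"), repeat=n):
            # Every daughter receives the same stack x, so each yields the same string.
            yield expand_A(stack) * k

print(sorted(k_copy(3, 1)))
```

Because the stack is shared only amongst the daughters and never with the mother S[], the number of dependent copies is fixed by the length of the right-hand side.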



Example 3 PLIG can "count" to any fixed k, i.e., a PLIG with the
following productions generates the language

$\{ a_1^n \ldots a_k^n \mid n \geq 0 \}$

where $k \geq 1$.

$S[\,] \to A_1[x] \ldots A_k[x]$,
$A_1[x\sigma] \to a_1\,A_1[x]$,   $A_1[\,] \to \lambda$,
$\vdots$
$A_k[x\sigma] \to a_k\,A_k[x]$,   $A_k[\,] \to \lambda$.



In PLIG, stacks shared amongst siblings cannot be passed to the
mother. As a consequence, there is no possibility that recursion can
be used to increase the number of dependent branches. In fact, the
number of dependent branches is bounded by the length of the
right-hand-side of productions. By the same token, however, PLIG may
only generate structural descriptions in which dependent branches
begin at nodes that are siblings of one another. Note that the tree
shown in Figure 2 is unobtainable because the branch rooted at
$\eta_1$ is dependent on more than one of the branches originating at
its sibling $\eta_2$.

Figure 2: Tree set for $\{ a^n b^n c^n \mid n \geq 1 \}$, where
$\tau_1 = \sigma_1$ and $\tau_{i+1} = \sigma_2(\tau_i)$

This limitation can be overcome by moving to a formalism that
manipulates trees rather than stacks. We consider an extension of CFG
in which each nonterminal A is associated with a tree $\tau$.
Productions now specify how the tree associated with the mother is
related to the trees associated with the daughters. We denote trees
with first order terms. For example, the following production
requires that the x and y subtrees of the mother's tree are shared
with the B and C daughters, respectively. In addition, the daughters
have in common the subtree z.

$A[\sigma_0(x, y)] \to B[\sigma_1(x, z)]\ C[\sigma_2(y, z)]$

There is a need to incorporate some kind of generalized notion of
linearity into such a system. Corresponding to the linearity
restriction in LIG we require that any part of the mother's tree is
passed to at most one daughter. Corresponding to the partial
linearity of PLIG, we permit subtrees that are not shared with the
mother to be shared amongst the daughters. Under these conditions,
the tree set shown in Figure 2 can be generated. The nodes $\eta_1$
and $\eta_2$ share the tree $\tau_n$, which occurs twice at the node
$\eta_2$. At $\eta_2$ the two copies of $\tau_n$ are distributed
across the daughters.

The formalism as currently described can be used to simulate
arbitrary Turing Machine computations. To see this, note that an
instantaneous description of a Turing Machine can be encoded with a
tree as shown in Figure 3. Moves of the Turing Machine can be
simulated by unary productions. The following production may be
glossed: "if in state q and scanning the symbol X, then change state
to q', write the symbol Y and move left"¹.

$A[q(W(x), X, y)] \to A[q'(x, W, Y(y))]$

Figure 3: Encoding a Turing Machine

One solution to this problem is to prevent a sin- 
gle daughter sharing more than one of its subtrees 
with the mother. However, we do not impose this 
restriction because it still leaves open the possibility 
of generating trees in which every branch has the 
same length, thus violating the condition that trees 
have at most a bounded number of unbounded, dependent
branches. Figure 4 shows how a set of such
trees can be generated by illustrating the effect of
the following production.

$A[\sigma(\sigma(x, y), \sigma(x', y'))] \to A[\sigma(z, x)]\ A[\sigma(z, y)]\ A[\sigma(z, x')]\ A[\sigma(z, y')]$

To see this, assume (by induction) that all four of the daughter
nonterminals are associated with the full binary tree of height i
($\tau_i$). All four of these trees are constrained to be equal by
the production given above, which requires that they have identical
left (i.e. z) subtrees (these subtrees must be the full binary tree
$\tau_{i-1}$). Passing the right subtrees (x, y, x' and y') to the
mother as shown allows the construction of a full binary tree with
height i + 1 ($\tau_{i+1}$). This can be repeated an unbounded number
of times so that all full binary trees are produced.

¹ There will be a set of such productions for each tape symbol W.

Figure 4: Building full binary trees
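The inductive step can be checked mechanically. The sketch below is only an illustration: it encodes trees as nested Python tuples, uses the string "leaf" as an assumed base case for height 0, and applies the production bottom-up. The names `full_tree` and `mother_tree` are hypothetical.

```python
def full_tree(height):
    """Full binary tree of the given height; 'leaf' is an assumed height-0 tree."""
    if height == 0:
        return "leaf"
    sub = full_tree(height - 1)
    return ("s", sub, sub)

def mother_tree(daughters):
    """Apply A[s(s(x,y), s(x',y'))] -> A[s(z,x)] A[s(z,y)] A[s(z,x')] A[s(z,y')] bottom-up.

    Returns the mother's tree, or None when the four daughters do not
    agree on their left subtree z.
    """
    if len(daughters) != 4 or any(not isinstance(d, tuple) for d in daughters):
        return None
    if len({d[1] for d in daughters}) != 1:      # all left subtrees must be the same z
        return None
    x, y, x2, y2 = (d[2] for d in daughters)     # right subtrees are passed to the mother
    return ("s", ("s", x, y), ("s", x2, y2))

tau_2 = full_tree(2)
print(mother_tree([tau_2] * 4) == full_tree(3))
```

Each application doubles the number of equal-length branches, which is the unbounded growth in dependent branches that the constraints introduced next are meant to exclude.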

To overcome both of these problems we impose 
the following additional constraint on the produc- 
tions of a grammar. We require that subtrees of 
the mother that are passed to daughters that share 
subtrees with one another must appear as siblings in 
the mother's tree. Note that this condition rules out 
the production responsible for building full binary 
trees since the x,y,x' and y' subtrees are not sib- 
lings in the mother's tree despite the fact that all of 
the daughters share a common subtree z. Moreover, 
since a daughter shares subtrees with itself, a spe- 
cial case of the condition is that subtrees occurring 
within some daughter can only appear as siblings in 
the mother. This condition also rules out the Turing 
Machine simulation. We refer to this formalism as 
Partially Linear Tree Grammars (PLTG). As a further
illustration of the constraints placed on shared
subtrees, Figure 5 shows a local tree that could
appear in a derivation tree. This local tree is licensed
by the following production, which respects all of the
constraints on PLTG productions.

$A[\sigma_1(\sigma_2(x_1, x_2, x_3), \sigma_3(x_4, \sigma_4))] \to B[\sigma_5(x_5, x_5, x_1)]\ C[\sigma_6(\sigma_7, x_4)]\ D[\sigma_8(x_2, x_3, x_5)]$

Note that in Figure 5 the daughter nodes labelled
B and D share a common subtree and the subtrees
shared between the mother and the B and D daughters
appear as siblings in the tree associated with
the mother.

Figure 5: A PLTG local tree
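The constraints on PLTG productions can be stated as a small check over terms. The following Python sketch is one possible reading of them, not a definitive formulation: terms are nested tuples whose first element is the node label, variables are plain strings, and daughters are grouped transitively whenever they share a subtree that does not come from the mother (the transitive grouping is an assumption). The function names are hypothetical.

```python
def variables(term):
    """Yield every variable occurrence in a term (variables are plain strings)."""
    if isinstance(term, str):
        yield term
    else:
        for child in term[1:]:               # term[0] is the node label
            yield from variables(child)

def parents(term, path=()):
    """Map each variable of the mother's term to the position of its parent node."""
    if isinstance(term, str):
        return {term: None}
    found = {}
    for i, child in enumerate(term[1:]):
        if isinstance(child, str):
            found[child] = path
        else:
            found.update(parents(child, path + (i,)))
    return found

def is_pltg_production(mother, daughters):
    parent = parents(mother)
    d_vars = [set(variables(d)) for d in daughters]
    # Linearity: a subtree of the mother is passed to at most one daughter.
    for v in parent:
        if sum(v in vs for vs in d_vars) > 1:
            return False
    # Group daughters that share subtrees not inherited from the mother.
    groups = []                               # (daughter indices, pooled local variables)
    for i, vs in enumerate(d_vars):
        own = vs - set(parent)
        members, pooled, untouched = {i}, set(own), []
        for g_members, g_vars in groups:
            if g_vars & own:
                members |= g_members
                pooled |= g_vars
            else:
                untouched.append((g_members, g_vars))
        groups = untouched + [(members, pooled)]
    # Sibling condition: within a group, inherited subtrees have one common parent.
    for members, _ in groups:
        inherited = set().union(*(d_vars[i] for i in members)) & set(parent)
        if len({parent[v] for v in inherited}) > 1:
            return False
    return True

# The Turing Machine move: x and y are not siblings in the mother's tree.
tm = (("q", ("W", "x"), ("X",), "y"), [("q'", "x", ("W",), ("Y", "y"))])
print(is_pltg_production(*tm))
```

Under this reading the Turing Machine production and the production that builds full binary trees are both rejected, while a production of the shape illustrated in Figure 5 is accepted.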

Example 4 The PLTG with the following productions generates the
language

$\{ a^n b^n c^n \mid n \geq 1 \}$

and the tree set shown in Figure 2.

$S_1[\sigma_0] \to A[x]\ S_2[\sigma(x, x)]$,
$S_2[\sigma(x, y)] \to B[x]\ S_3[y]$,
$S_3[x] \to C[x]$,
$A[\sigma_2(x)] \to a\,A[x]$,   $A[\sigma_1] \to a$,
$B[\sigma_2(x)] \to b\,B[x]$,   $B[\sigma_1] \to b$,
$C[\sigma_2(x)] \to c\,C[x]$,   $C[\sigma_1] \to c$.



Example 5 The PLTG with the following productions generates the
language of strings consisting of k copies of strings of matching
parentheses, i.e., the language

$\{ w^k \mid w \in D \}$

where $k \geq 1$ and D is the set of strings in $\{(, )\}^*$ that have
balanced brackets, i.e., the Dyck language over $\{(, )\}$.

$S[\,] \to \underbrace{A[x] \ldots A[x]}_{k \text{ copies}}$,   $A[\,] \to \lambda$,
$A[\sigma_1(x)] \to (\ A[x]\ )$,   $A[\sigma_2(x, y)] \to A[x]\ A[y]$.
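As an illustration of how a shared tree (rather than a shared stack) drives the derivation in Example 5, the following Python sketch expands the productions top-down. Trees are encoded as nested tuples, with the empty tuple standing for the empty tree of A[]; the labels "s1" and "s2" and the function names are illustrative assumptions.

```python
def expand_A(tree):
    """A[] -> empty string, A[s1(x)] -> ( A[x] ), A[s2(x, y)] -> A[x] A[y]."""
    if tree == ():
        return ""
    if tree[0] == "s1":
        return "(" + expand_A(tree[1]) + ")"
    if tree[0] == "s2":
        return expand_A(tree[1]) + expand_A(tree[2])
    raise ValueError(f"no production applies to {tree!r}")

def expand_S(tree, k):
    """S[] -> A[x] ... A[x] (k copies): every daughter receives the same tree x."""
    return expand_A(tree) * k

x = ("s2", ("s1", ()), ("s1", ("s1", ())))
print(expand_S(x, 2))
```

The branching node s2 is what a single stack cannot encode, which is consistent with the conjecture that this language lies outside PLIG.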



3 Trees to Feature Structures 

Finally, we note that acyclic feature structures with- 
out re-entrancy can be viewed as trees with branches 
labelled by feature names and atomic values only 
found at leaf nodes (interior nodes being unlabelled). 
Based on this observation, we can consider the con- 
straints we have formulated for the tree system PLTG 
as constraints on a unification-based grammar
formalism such as PATR. We will call this system
Partially Linear PATR (PLPATR). Having made the move
from trees to feature structures, we consider the
possibility of re-entrancy in PLPATR.
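The correspondence between re-entrancy-free feature structures and trees can be sketched in a few lines of Python. Here a feature structure is assumed to be a nested dict whose keys are feature names and whose non-dict values are atoms; the function `paths` (a hypothetical name) lists the root-to-leaf branches of the corresponding tree.

```python
def paths(fs, prefix=()):
    """Yield (feature path, atomic value) pairs of a re-entrancy-free feature structure."""
    for feature, value in fs.items():
        if isinstance(value, dict):
            # An interior, unlabelled node: descend along the branch named by the feature.
            yield from paths(value, prefix + (feature,))
        else:
            # A leaf node carrying an atomic value.
            yield prefix + (feature,), value

fs = {"f": "a", "g": {"f": "c", "g": {"h": "d"}}}
for path, atom in sorted(paths(fs)):
    print(".".join(path), "=", atom)
```

Since no two paths lead to the same node, each branch can be read off independently, which is the tree view used in the preceding sections.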

Note that the feature structure at the root of a 
PLPATR derivation tree will not involve re-entrancy. 
However, for the following reasons we believe that 
this does not constitute as great a limitation as it 
might appear. In unification-based grammar, the 
feature structure associated with the root of the tree 
is often regarded as the structure that has been de- 
rived from the input (i.e., the output of a parser). As 
a consequence there is a tendency to use the gram- 
mar rules to accumulate a single, large feature struc- 
ture giving a complete encoding of the analysis. To 
do this, unbounded feature information is passed up 
the tree in a way that violates the constraints devel- 
oped in this paper. Rather than giving such promi- 
nence to the root feature structure, we suggest that 
the entire derivation tree should be seen as the ob- 
ject that is derived from the input, i.e., this is what 
the parser returns. Because feature structures asso- 
ciated with all nodes in the tree are available, feature 
information need only be passed up the tree when it 
is required in order to establish dependencies within 
the derivation tree. When this approach is taken, 
there may be less need for re-entrancy in the root 
feature structure. Furthermore, re-entrancy in the
form of shared feature structures within and across
nodes will be found in PLPATR (see for example
Figure 5).

4 Generative Capacity 

LIG are more powerful than CFG and are known to be weakly equivalent
to Tree Adjoining Grammar, Combinatory Categorial Grammar, and Head
Grammar (Vijay-Shanker and Weir, 1994). PLIG are more powerful than
LIG since they can generate the k-copy language for any fixed k (see
Example 2). Slightly more generally, PLIG can generate the language

$\{ w^k \mid w \in R \}$

for any $k \geq 1$ and regular language R. We believe that the
language involving copies of strings of matching brackets described
in Example 5 cannot be generated by PLIG but, as shown in Example 5,
it can be generated by PLTG and therefore PLPATR. Slightly more
generally, PLTG can generate the language

$\{ w^k \mid w \in L \}$

for any $k \geq 1$ and context-free language L. It appears that the
class of languages generated by PLTG is included in those languages
generated by Linear Context-Free Rewriting Systems (Vijay-Shanker et
al., 1987) since the construction involved in a proof of this
underlies the recognition algorithm discussed in the next section.

As is the case for the tree sets of IG, LIG and 
Tree Adjoining Grammar, the tree sets generated 
by PLTG have path sets that are context-free lan- 
guages. In other words, the set of all strings labelling 
root to frontier paths of derivation trees is a context- 
free language. While the tree sets of LIG and Tree 
Adjoining Grammars have independent branches, 
PLTG tree sets exhibit dependent branches, where 
the number of dependent branches in any tree is 
bounded by the grammar. Note that the number 
of dependent branches in the tree sets of IG is not 
bounded by the grammar (e.g., they generate sets of 
all full binary trees). 

5 Tractable Recognition 

In this section we outline the main ideas underlying
a polynomial time recognition algorithm for PLPATR
that generalizes the CKY algorithm (Kasami, 1965;
Younger, 1967). The key to this algorithm is the
use of structure-sharing techniques similar to those
used to process LIG efficiently (Vijay-Shanker and
Weir, 1993). To understand how these techniques
are applied in the case of PLPATR, it is therefore
helpful to consider first the somewhat simpler case
of LIG.
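For reference, the CKY recognizer that serves as the starting point can be sketched as follows. This is a minimal Python version for a CFG in Chomsky normal form; the grammar encoding (`lexical` mapping terminals to nonterminals, `binary` as a list of (A, B, C) triples for A -> B C) and the function name `cky` are assumptions made for the illustration.

```python
def cky(words, lexical, binary, start="S"):
    """Return True if the CNF grammar derives the input; P[(i, j)] covers words[i..j]."""
    n = len(words)
    if n == 0:
        return False
    P = {}
    for i, w in enumerate(words):
        P[(i, i)] = set(lexical.get(w, ()))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            cell = set()
            for k in range(i, j):                 # split point between the two daughters
                for a, b, c in binary:
                    if b in P[(i, k)] and c in P[(k + 1, j)]:
                        cell.add(a)
            P[(i, j)] = cell
    return start in P[(0, n - 1)]

# A CNF grammar for { a^n b^n | n >= 1 }: S -> A B | A X, X -> S B, A -> a, B -> b.
lexical = {"a": {"A"}, "b": {"B"}}
binary = [("S", "A", "B"), ("S", "A", "X"), ("X", "S", "B")]
print(cky(list("aabb"), lexical, binary))
```

Each cell holds only nonterminals, so its size is bounded by the grammar; the difficulty discussed next is that this no longer holds once nonterminals carry unbounded stacks.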

The CKY algorithm is a bottom-up recognition algorithm for CFG. For a
given grammar G and input string $a_1 \ldots a_n$ the algorithm
constructs an array P, having $n^2$ elements, where element $P[i, j]$
stores all and only those nonterminals of G that derive the substring
$a_i \ldots a_j$. A naive adaptation of this algorithm for LIG
recognition would involve storing a set of nonterminals and their
associated stacks. But since stack length is at least proportional to
the length of the input string, the resultant algorithm would exhibit
exponential space and time complexity in the worst case.
Vijay-Shanker and Weir (1993) showed that the behaviour of the naive
algorithm can be improved upon. In LIG derivations the application of
a rule cannot depend on more than a bounded portion of the top of the
stack. Thus, rather than storing the whole of the potentially
unbounded stack in a particular array entry, it suffices to store
just a bounded portion together with a pointer to the residue.

Figure 6: "Context-Freeness" in LIG derivations
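The storage idea can be pictured with a small data structure. The Python sketch below is a simplification under stated assumptions, not the algorithm of Vijay-Shanker and Weir: an array entry holds a nonterminal, the top stack symbol, and an optional pointer (nonterminal, top symbol, p, q) to entries for a shorter substring that stand for the rest of the stack. The names `Entry`, `Chart` and `stacks` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, Optional, Set, Tuple

@dataclass(frozen=True)
class Entry:
    nonterminal: str
    top: Optional[str] = None                          # top stack symbol; None for the empty stack
    rest: Optional[Tuple[str, str, int, int]] = None   # (B, top of B's stack, p, q), or None

Chart = Dict[Tuple[int, int], Set[Entry]]

def stacks(chart: Chart, entry: Entry) -> Iterator[Tuple[str, ...]]:
    """Enumerate the full stacks an entry stands for by following its pointer."""
    if entry.top is None:
        yield ()
    elif entry.rest is None:
        yield (entry.top,)
    else:
        b, b_top, p, q = entry.rest
        for below in chart.get((p, q), ()):
            # Any stored entry matching the bounded description may supply the residue.
            if below.nonterminal == b and below.top == b_top:
                for residue in stacks(chart, below):
                    yield residue + (entry.top,)

a = Entry("A", "t", ("B", "s", 2, 3))
chart: Chart = {
    (2, 2): {Entry("C", "r")},
    (2, 3): {Entry("B", "s"), Entry("B", "s", ("C", "r", 2, 2))},
    (1, 4): {a},
}
print(sorted(stacks(chart, a)))
```

A single entry for A thus stands for every stack whose residue is recorded under the pointed-to cell, so the full stacks never need to be copied into the array.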

Consider Figure 6. Tree (a) shows a LIG derivation of the substring
$a_i \ldots a_j$ from the object $A[\alpha\sigma\sigma']$. In this
derivation tree, the node labelled $B[\alpha\sigma]$ is a
distinguished descendant of the root² and is the first point below
$A[\alpha\sigma\sigma']$ at which the top symbol ($\sigma$) of the
(unbounded) stack $\alpha\sigma$ is exposed. This node is called the
terminator of the node labelled $A[\alpha\sigma\sigma']$. It is not
difficult to show that only that portion of the derivation below the
terminator node is dependent on more than the top of the stack
$\alpha\sigma$. It follows that for any stack $\alpha'\sigma$, if
there is a derivation of the substring $a_p \ldots a_q$ from
$B[\alpha'\sigma]$ (see tree (b)), then there is a corresponding
derivation of $a_i \ldots a_j$ from $A[\alpha'\sigma\sigma']$ (see
tree (c)). This captures the sense in which LIG derivations exhibit
"context-freeness". Efficient storage of stacks can therefore be
achieved by storing in $P[i, j]$ just that bounded amount of
information (nonterminal plus top of stack) relevant to rule
application, together with a pointer to any entry in $P[p, q]$
representing a subderivation from an object $B[\alpha'\sigma]$.

² The stack $\alpha\sigma$ associated with B is "inherited" from
the stack associated with A at the root of the tree.

Before describing how we adapt this technique to 
the case of PLPATR we discuss the sense in which 
PLPATR derivations exhibit a "context-freeness" 
property. The constraints on PLPATR which we have 
identified in this paper ensure that these feature val- 
ues can be manipulated independently of one an- 
other and that they behave in a stack-like way. As a 
consequence, the storage technique used effectively 
for LIG recognition may be generalized to the case of 
PLPATR. 

Suppose that we have the derived tree shown in Figure 7 where the
nodes at the root of the subtrees $\tau_1$ and $\tau_2$ are the
so-called f-terminator and g-terminator of the tree's root,
respectively. Roughly speaking, the f-terminator of a node is the
node from which it gets the value for the feature f. Because of the
constraints on the form of PLPATR productions, the derivations
between the root of $\tau$ and these terminators cannot in general
depend on more than a bounded part of the feature structures [1] and
[2]. At the root of the figure the feature structures [1] and [2]
have been expanded to show the extent of the dependency in this
example. In this case, the value of the feature f in [1] must be a,
whereas the feature g is not fixed. Furthermore, the value of the
feature g in [2] must be b, whereas the feature f is not fixed. This
means, for example, that the applicability of the productions used on
the path from the root of $\tau_1$ to the root of $\tau$ depends on
the feature f in [1] having the value a but does not depend on the
value of the feature g in [1]. Note that in this tree the value of
the feature g in [1] is

$[\, f : c,\ g : F_3 \,]$

and the value of the feature f in [2] is

$[\, f : \ldots,\ g : d \,]$

Suppose that, in addition to the tree shown in Figure 7, the grammar
generates the pair of trees shown in Figure 8. Notice that while the
feature structures at the root of $\tau_3$ and $\tau_4$ are not
compatible with [1] and [2], they do agree with respect to those
parts that are fully expanded at $\tau$'s root node. The
"context-freeness" of PLPATR means that given the three trees shown
in Figures 7 and 8 the tree shown in Figure 9 will also be generated
by the grammar.

Figure 7: Terminators in PLPATR

Figure 8: Compatible subderivations

Figure 9: Alternative derivation

This gives us a means of efficiently storing the
potentially unbounded feature structures associated
with nodes in a derivation tree (derived feature
structures). By analogy with the situation for LIG,
derived feature structures can be viewed as consist- 
ing of a bounded part (relevant to rule application) 
plus unbounded information about the values of fea- 
tures. For each feature, we store in the recognition 
array a bounded amount of information about its 
value locally, together with a pointer to a further 
array element. Entries in this element of the recog- 
nition array that are compatible (i.e. unifiable) with 
the bounded, local information correspond to differ- 
ent possible values for the feature. For example, we 
can use a single entry in the recognition array to 
store the fact that all of the feature structures that 
can appear at the root of the trees in Figure 9 derive
the substring $a_i \ldots a_j$. This entry would be
underspecified; for example, the value of [1] would
be specified to be any feature structure stored in the array
entry for the substring $a_p \ldots a_q$ whose feature f had
the value a.
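The compatibility test between bounded local information and stored entries can be sketched as a unifiability check. The Python fragment below assumes re-entrancy-free feature structures encoded as nested dicts with atomic leaves, and it treats an atom as incompatible with any complex value; the function name `unifiable` and the sample data are hypothetical.

```python
def unifiable(fs1, fs2):
    """True if two re-entrancy-free feature structures (nested dicts or atoms) are compatible."""
    if isinstance(fs1, dict) and isinstance(fs2, dict):
        # Only features present in both structures can clash.
        return all(unifiable(fs1[f], fs2[f]) for f in fs1.keys() & fs2.keys())
    return fs1 == fs2          # atoms must be identical (assumption: atom vs. dict fails)

# Bounded, local information kept in the entry for a_i ... a_j: feature f must be a.
local = {"f": "a"}
# Hypothetical contents of the array element that the pointer refers to.
stored = [{"f": "a", "g": {"f": "c"}}, {"f": "b", "g": "d"}, {"g": "d"}]
print([fs for fs in stored if unifiable(local, fs)])
```

Only the stored structures that pass this check count as possible values, so the local entry never has to spell out the unbounded part itself.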

However, this is not the end of the story. In con- 
trast to LIG, PLPATR licenses structure sharing on 
the right hand side of productions. That is, partial 
linearity permits feature values to be shared between 
daughters where they are not also shared with the 
mother. But in that case, it appears that check- 
ing the applicability of a production at some point 
in a derivation must entail the comparison of struc- 
tures of unbounded size. In fact, this is not so. The 
PLPATR recognition algorithm employs a second ar- 
ray (called the compatibility array), which encodes 
information about the compatibility of derived fea- 
ture structures. Tuples of compatible derived feature 
structures are stored in the compatibility array us- 
ing exactly the same approach used to store feature 
structures in the main recognition array. The pres- 
ence of a tuple in the compatibility array (the indices 
of which encode which input substrings are spanned) 
indicates the existence of derivations of compatible 
feature structures. Due to the "context-freeness" of 
PLPATR, new entries can be added to the compati- 
bility array in a bottom-up manner based on exist- 
ing entries without the need to reconstruct complete 
feature structures. 

6 Conclusions 

In considering ways of extending LIG, this paper has 
introduced the notion of partial linearity and shown 
how it can be manifested in the form of a constrained 
unification-based grammar formalism. We have ex- 
plored examples of the kinds of tree sets and string 
languages that this system can generate. We have 
also briefly outlined the sense in which partial
linearity gives rise to "context-freeness" in derivations
and sketched how this can be exploited in order to
obtain a tractable recognition algorithm.